jetson tx2
Easz: An Agile Transformer-based Image Compression Framework for Resource-constrained IoTs
Mao, Yu, Li, Jingzong, Wang, Jun, Xu, Hong, Kuo, Tei-Wei, Guan, Nan, Xue, Chun Jason
Neural image compression, necessary in various machine-to-machine communication scenarios, suffers from heavy encode-decode structures and inflexibility in switching between different compression levels. Consequently, applying neural image compression, which was developed for powerful servers with high computational and storage capacities, to edge devices raises significant challenges. We take a step toward solving these challenges by proposing a new transformer-based, edge-compute-free image coding framework called Easz. Easz shifts the computational overhead to the server, and hence avoids the heavy encoding and model switching overhead on the edge. Easz utilizes a patch-erase algorithm to selectively remove image contents using a conditional uniform-based sampler. The erased pixels are reconstructed on the receiver side through a transformer-based framework. To further reduce the computational overhead on the receiver, we introduce a lightweight transformer-based reconstruction structure. Extensive evaluations conducted on a real-world testbed demonstrate multiple advantages of Easz over existing compression approaches, in terms of adaptability to different compression levels, computational efficiency, and image reconstruction quality.
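To make the patch-erase idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a plain uniform sampler over fixed-size patches; the conditioning rule of the paper's sampler is not described in the abstract and is not reproduced. The function name `erase_patches` and its parameters are hypothetical.

```python
import random


def erase_patches(image, patch, ratio, seed=0):
    """Zero out a uniformly sampled fraction of patch x patch blocks.

    image: 2D list of pixel values; height and width are assumed to be
           multiples of `patch` (an assumption of this sketch).
    Returns (erased_image, mask), where mask[r][c] is True if the patch
    at grid position (r, c) was erased.
    """
    rows, cols = len(image) // patch, len(image[0]) // patch
    n_erase = int(rows * cols * ratio)
    rng = random.Random(seed)
    # Uniform sampling without replacement over patch indices.
    chosen = set(rng.sample(range(rows * cols), n_erase))
    mask = [[(r * cols + c) in chosen for c in range(cols)] for r in range(rows)]
    out = [row[:] for row in image]  # copy so the input stays untouched
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for y in range(r * patch, (r + 1) * patch):
                    for x in range(c * patch, (c + 1) * patch):
                        out[y][x] = 0
    return out, mask
```

In this sketch the sender would transmit only the surviving patches plus the mask (or the seed), and the receiver would fill in the erased patches with its reconstruction model. Changing `ratio` changes the compression level without switching models on the edge device.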
EdgeMoE: Empowering Sparse Large Language Models on Mobile Devices
Yi, Rongjie, Guo, Liwei, Wei, Shiyun, Zhou, Ao, Wang, Shangguang, Xu, Mengwei
Large language models (LLMs) such as GPTs and Mixtral-8x7B have revolutionized machine intelligence due to their exceptional abilities in generic ML tasks. Transitioning LLMs from datacenters to edge devices brings benefits like better privacy and availability, but is challenged by their massive parameter size and thus unbearable runtime costs. To this end, we present EdgeMoE, an on-device inference engine for mixture-of-expert (MoE) LLMs -- a popular form of sparse LLM that scales its parameter size with almost constant computing complexity. EdgeMoE achieves both memory- and compute-efficiency by partitioning the model into the storage hierarchy: non-expert weights are held in device memory, while expert weights are held on external storage and fetched to memory only when activated. This design is motivated by a key observation that expert weights are bulky but infrequently used due to sparse activation. To further reduce the expert I/O swapping overhead, EdgeMoE incorporates two novel techniques: (1) expert-wise bitwidth adaptation that reduces the expert sizes with tolerable accuracy loss; (2) expert preloading that predicts the activated experts ahead of time and preloads them with the compute-I/O pipeline. On popular MoE LLMs and edge devices, EdgeMoE showcases significant memory savings and speedup over competitive baselines. The code is available at https://github.com/UbiquitousLearning/mllm.
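The storage-hierarchy split described above can be illustrated with a small cache that keeps a bounded number of expert weights in memory and loads the rest on demand. This is a sketch under stated assumptions, not EdgeMoE's code: the class name `ExpertCache`, the `load_fn` callback, and the least-recently-used eviction policy are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class ExpertCache:
    """Keep at most `capacity` experts in memory; fetch others on activation."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn      # hypothetical: reads one expert from storage
        self.cache = OrderedDict()  # expert_id -> weights, oldest entry first
        self.loads = 0              # number of storage fetches so far

    def get(self, expert_id):
        if expert_id in self.cache:
            # Hit: mark as most recently used.
            self.cache.move_to_end(expert_id)
            return self.cache[expert_id]
        # Miss: fetch from storage, then evict the least recently used expert
        # if the cache is over capacity (LRU is an assumption of this sketch).
        weights = self.load_fn(expert_id)
        self.loads += 1
        self.cache[expert_id] = weights
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
        return weights
```

A preloading step like the one the abstract mentions could call `get` for the predicted experts of the next layer while the current layer is still computing, so that the storage fetch overlaps with computation.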
A Framework for Controlling Multiple Industrial Robots using Mobile Applications
Alvarado, Daniela, Asif, Dr. Seemal
Purpose: Over the last few decades, the development of hardware and software has enabled the application of advanced systems. In the robotics field, UI design is an intriguing area to explore because devices now offer a wide range of functionalities in a reduced size. Moreover, using the same UI to control several systems is of great interest, since it requires less learning effort and time from users. Therefore, this paper presents a mobile application to control two industrial robots with four modes of operation. Design/methodology/approach: The smartphone was selected as the interface due to its wide range of capabilities, and MIT App Inventor, whose environment is supported by Android smartphones, was used to create the application. For the validation, ROS was used since it is a fundamental framework utilised in industrial robotics, and the Arduino Uno was used to establish the data transmission between the smartphone and the NVIDIA Jetson TX2 board. In MIT App Inventor, the graphical interface was created to visualize the options available in the app, whereas two Python scripts were programmed to perform the simulations in ROS and carry out the tests. Findings: The results indicated that using the sliders to control the robots is more favourable than using the Orientation Sensor, due to the sensitivity of the sensor and the human difficulty of holding the smartphone perfectly still. Another important finding was the limitations of the autonomous mode, in which the robot grabs an object. In this case, the configuration of the Kinect camera and the controllers has a significant impact on the success of the simulation. Finally, it was observed that the delay was acceptable despite the use of the Arduino Uno to transfer the data between the smartphone and the NVIDIA Jetson TX2.
PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices
Yan, Minghao, Wang, Hongyi, Venkataraman, Shivaram
As neural networks (NNs) are deployed across diverse sectors, their energy demand correspondingly grows. While several prior works have focused on reducing energy consumption during training, the continuous operation of ML-powered systems leads to significant energy use during inference. This paper investigates how the configuration of on-device hardware (elements such as GPU, memory, and CPU frequency, often neglected in prior studies) affects energy consumption for NN inference with regular fine-tuning. We propose PolyThrottle, a solution that optimizes configurations across individual hardware components using Constrained Bayesian Optimization in an energy-conserving manner. Our empirical evaluation uncovers novel facets of the energy-performance equilibrium, showing that we can save up to 36 percent of energy for popular models.
ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection
Huang, Kai, Yang, Boyuan, Gao, Wei
On-device training is essential for neural networks (NNs) to continuously adapt to new online data, but can be time-consuming due to the device's limited computing power. To speed up on-device training, existing schemes select the trainable NN portion offline or conduct unrecoverable selection at runtime, so the evolution of the trainable NN portion is constrained and cannot adapt to the current need for training. Instead, runtime adaptation of on-device training should be fully elastic, i.e., every NN substructure can be freely removed from or added to the trainable NN portion at any time in training. In this paper, we present ElasticTrainer, a new technique that enforces such elasticity to achieve the required training speedup with the minimum NN accuracy loss. Experimental results show that ElasticTrainer achieves up to 3.5x more training speedup in wall-clock time and reduces energy consumption by 2x-3x more compared to the existing schemes, without noticeable accuracy loss.
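As a rough illustration of choosing a trainable subset under a time budget, the sketch below uses a simple greedy heuristic. This is not ElasticTrainer's selection algorithm, which the abstract does not spell out. The function name `select_tensors`, the per-tensor `importance` and `cost` inputs, and the greedy importance-per-cost ordering are assumptions made for illustration, and the sketch ignores dependencies between tensors during backpropagation.

```python
def select_tensors(importance, cost, budget):
    """Greedily pick tensor indices to train within a time budget.

    importance: estimated benefit of training each tensor (hypothetical input).
    cost:       estimated training time of each tensor, all positive.
    budget:     total training time allowed per iteration.
    Returns the selected indices in ascending order.
    """
    # Visit tensors from highest to lowest importance per unit cost.
    order = sorted(range(len(cost)),
                   key=lambda i: importance[i] / cost[i],
                   reverse=True)
    selected, used = [], 0.0
    for i in order:
        if used + cost[i] <= budget:
            selected.append(i)
            used += cost[i]
    return sorted(selected)
```

Because the function is stateless, it can be re-run at any point in training with refreshed importance estimates, so a tensor dropped in one round can be selected again later. That is the kind of elasticity the abstract argues for.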
Inception Spotlight: New Skydio 2 Drone Powered by NVIDIA Jetson GPUs Can Track up to 10 Objects at a Time - NVIDIA Developer News Center
Redwood City, California-based Skydio, a member of NVIDIA's startup accelerator, Inception, has just released the latest version of its AI-capable, GPU-accelerated drone, Skydio 2. Equipped with six 4K cameras, with an NVIDIA Jetson TX2 as the processor for the autonomous system, Skydio 2 is capable of flying for up to 23 minutes at a time and can be piloted by either an experienced pilot or by the AI-based system. The Jetson TX2 has 256 GPU cores and is capable of 1.3 trillion operations a second. According to the team, the drone uses nine custom deep neural networks that help the drone track up to 10 objects while traveling at speeds of 36 miles per hour. "Skydio 2 enables you to capture everything from a backyard pickup game to a downhill adventure with a single tap," the company wrote in a blog post. "It builds on Skydio R1's foundation and takes it to the next level."
AI Helps Protect Taiwan's Endangered Leopard Cats - NVIDIA Blog
There's no mistaking why the leopard cat of Taiwan got its name. While only about the size of domestic felines, it sports a beautiful, flower-spotted pattern on its fur. There's also no debate about why the leopard cat, the only remaining native wild cat species in Taiwan, is on the edge of extinction. Fewer than 500 of the leopard cats live in a natural habitat that overlaps with many development projects in the central regions of the island. In an otherwise rural area, the cats are often victims of roadkill due to increased traffic.
ECC: Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model
Yang, Haichuan, Zhu, Yuhao, Liu, Ji
Many DNN-enabled vision applications, such as those running on unmanned aerial vehicles, augmented reality headsets, and smartphones, constantly operate under severe energy constraints. Designing DNNs that can meet a stringent energy budget is becoming increasingly important. This paper proposes ECC, a framework that compresses DNNs to meet a given energy constraint while minimizing accuracy loss. The key idea of ECC is to model the DNN energy consumption via a novel bilinear regression function. The energy estimate model allows us to formulate DNN compression as a constrained optimization that minimizes the DNN loss function over the energy constraint. The optimization problem, however, has nontrivial constraints. Therefore, existing deep learning solvers do not apply directly. We propose an optimization algorithm that combines the essence of the Alternating Direction Method of Multipliers (ADMM) framework with gradient-based learning algorithms. The algorithm decomposes the original constrained optimization into several subproblems that are solved iteratively and efficiently. ECC is also portable across different hardware platforms without requiring hardware knowledge. Experiments show that ECC achieves higher accuracy under the same or lower energy budget compared to state-of-the-art resource-constrained DNN compression techniques.
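To show what a bilinear energy estimate can look like, the sketch below evaluates one possible form: a constant plus a weighted sum of products of adjacent per-layer density values. The exact functional form, the meaning of each variable, and the names `bilinear_energy` and `meets_budget` are assumptions for illustration. The abstract states only that the model is a bilinear regression function.

```python
def bilinear_energy(s, a0, a):
    """Estimate energy as a0 + sum_j a[j] * s[j] * s[j + 1].

    s:  per-layer density values (assumed form; one more entry than a).
    a0: constant offset, fitted by regression on measured energy.
    a:  per-layer coefficients, fitted by regression on measured energy.
    """
    if len(s) != len(a) + 1:
        raise ValueError("s must have exactly one more entry than a")
    return a0 + sum(aj * s[j] * s[j + 1] for j, aj in enumerate(a))


def meets_budget(s, a0, a, budget):
    """Check the energy constraint used in the constrained optimization."""
    return bilinear_energy(s, a0, a) <= budget
```

With all but one entry of `s` held fixed, the estimate is linear in the remaining entry. This is what makes such a model bilinear and keeps the constraint tractable for an iterative solver.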
Popping Big Data Fallacies On the Edge
Organizations today are drowning in data. But there continues to be vigorous debate on the best way to deal with that data. While some advocate creating big data lakes to store data that will subsequently be used for training machine learning models, there's a growing chorus of voices calling for a simpler and more real-time approach. You can count Simon Crosby, CTO of SWIM.ai, among proponents for a lighter-weight and less expensive approach to data collection and analysis, at least for a certain class of real-world machine learning problems at the edge. During a recent conversation with Datanami, Crosby threw cold water on the notion that uploading data to the cloud for storage and machine learning was the best way to get value out of the morasses of data created on edge devices.
TensorFlow Gains Hardware Support
There are a number of machine learning (ML) architectures that utilize deep neural networks (DNNs), including AlexNet, VGGNet, GoogLeNet, Inception, ResNet, FCN, and U-Net. These in turn run on frameworks like Berkeley's Caffe, Google's TensorFlow, Torch, Microsoft's Cognitive Toolkit (CNTK), and Apache's MXNet. Of course, support for these frameworks on specific hardware is required to actually run the ML applications. Each framework has advantages and disadvantages. For example, Caffe is an easy platform to start with, especially since one of its popular uses is image recognition.